Add common cache and per-build cache #62
|
Interesting! What is the advantage of the common cache? It seems like it would be better for each cache item to be attributable to a specific package. |
|
I'm sure there are other cases, but in this particular use, the common cache is helpful for descriptors that fetch their own sources. It's used in the Bitcoin overhaul because there's a build system shared between Gitian and the pull-tester. Since the build system fetches and verifies its own sources anyway, there's no need to include them as Gitian inputs. And since several descriptors share sources, it'd be senseless to fetch them for each one. |
|
A couple of comments:
I'm puzzled by "Since the buildsystem fetches and verifies its own sources anyway, there's no need to include them as gitian inputs". What is the downside to having the pull-tester place the sources in the inputs directory? |
|
1: I agree, but this work is a bit outside the box. I'll try to show below why I went this route. Neither of those is desirable, but they were sacrifices I made in order to unify things.
Would you mind giving the description of bitcoin/bitcoin#4592 a quick read? I'd like to give a real example of how all of this ties together. There's a lot going on, so I'll try to summarize as briefly as possible (hint: it won't actually be brief ;)
In the past, for Bitcoin, there's been a disconnect between what devs run, what the pull-tester tests, and what Gitian builds. I've attempted to unify those things so that the pull-tester is able to build/test exactly what Gitian will produce, minus the deterministic guarantees.
To do this, I created a build-system for them to share. This build-system builds all dependencies as needed and caches individual results. So if libfoo's build-recipe (or the build-system itself) hasn't changed since the last run, it won't be rebuilt. Instead, it will just be unpacked. This system is deterministic in its own right... the details are a bit complex, but you can assume that to be true.
With that done, the pull-tester and Gitian can store the build results and reuse them, rather than rebuilding each dependency every time. See here for how this is actually happening:
pull-tester: https://github.com/coryfields/bitcoin/blob/master/.travis.yml
Note how both of them call "make -C depends", then use those results to build bitcoin. The result is that our Gitian descriptors can stay static... we don't have to sync them up with anything, and yet we know that they'll build the same thing that the pull-tester did for any particular commit. If a dependency needs to be changed, it's changed in the dependency builder.
So, all that said, here's an example of it in action: Notice that the new version of qrencode was built/fetched/installed. Since nothing else depends on qrencode, nothing else had to be rebuilt. If (for example) qt had depended on qrencode, it would've been rebuilt as well against the new qrencode.
If I use Gitian to build that commit, the exact same thing will happen. Any cached results from previous builds will be used so that only qrencode will have to rebuild. The end result is a guarantee that Gitian will build exactly what the CI is building, with no (or very little, I hope) chance of deviation.
So this commit is all it takes for us to bump that dependency, have it built/verified, and have it present in a release. In less than 10 minutes.
So... I would very much like to maintain that behavior. IMO, it's a huge feature for us. However, the 2-cache system is admittedly very kludgy. Do you have any suggestions on how it could be done more elegantly? |
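To make the rebuild rule described above concrete, here is a minimal Python sketch. It is not the actual Bitcoin dependency builder; it only assumes that each package's cache key is a hash over its recipe text, a build-system version string, and the keys of its dependencies. The names `build_id`, `packages_to_rebuild`, and the recipe strings are hypothetical.

```python
import hashlib


def build_id(name, recipes, deps, system_version, _memo=None):
    """Return a hex digest identifying one package build (hypothetical scheme).

    The ID covers the package's recipe text, the build-system version,
    and the IDs of everything it depends on, so a change anywhere
    upstream yields a new ID and therefore a cache miss.
    """
    if _memo is None:
        _memo = {}
    if name not in _memo:
        h = hashlib.sha256()
        h.update(system_version.encode())
        h.update(recipes[name].encode())
        for dep in sorted(deps.get(name, ())):
            h.update(build_id(dep, recipes, deps, system_version, _memo).encode())
        _memo[name] = h.hexdigest()
    return _memo[name]


def packages_to_rebuild(recipes, deps, system_version, cached_ids):
    """List packages whose current build ID is not among the cached IDs."""
    return sorted(
        name for name in recipes
        if build_id(name, recipes, deps, system_version) not in cached_ids
    )


# Placeholder recipe texts; real recipes would be the build-recipe files.
old = {"qrencode": "recipe A", "qt": "recipe B"}
new = dict(old, qrencode="recipe A, bumped")

# Case 1: nothing depends on qrencode, so only qrencode misses the cache.
cached = {build_id(n, old, {}, "v1") for n in old}
print(packages_to_rebuild(new, {}, "v1", cached))    # ['qrencode']

# Case 2: if qt depended on qrencode, qt's ID would change as well.
deps = {"qt": ["qrencode"]}
cached = {build_id(n, old, deps, "v1") for n in old}
print(packages_to_rebuild(new, deps, "v1", cached))  # ['qrencode', 'qt']
```

Folding dependency IDs into each key is what makes a dependent package rebuild when something upstream changes, without any explicit invalidation step.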
|
To clarify, in case I didn't above, the pull-tester and Gitian know nothing of each other. The pull-tester runs automatically via some cloud magic, and devs use Gitian manually. I realize that it may read as though the pull-tester is using (or aware of) Gitian in some way, but that's not the case. The common factor is the dependency builder. |
|
@devrandom Any thoughts on the above? The bitcoin dependency builder is nearly merge-ready, and I'd like to have a plan for dealing with Gitian. |
|
I think a build artifact cache is likely to be a good direction. However, I'd like to make sure it's clear how to use it. Would it be possible to articulate exactly how each type of cache is meant to be used? |
|
Sure. I'll describe exactly how I've used it for bitcoin, though I'm sure there are other use-cases.
Before the cache:
Descriptor 1: Windows
Descriptor 2: Windows
Descriptor 3: Linux
Descriptor 4: Linux
Process: User builds descriptors 1 and 2, saves the outputs, copies them to inputs, then builds descriptors 3 and 4.
With the cache:
Descriptor 1: Windows
Descriptor 2: Linux
Process: User builds descriptors 1 and 2. If cached versions of libfoo are found from previous Gitian runs, they will be used instead of rebuilding. Note that the logic to determine whether a cached version can be reused is not handled here; that's up to the user to work out. libfoo-source.tar.gz is only fetched once, because by the time the 2nd descriptor runs, the 1st descriptor will already have put it in the global cache. |
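The "fetched only once" behavior can be sketched in Python as follows. This is an illustration under assumptions, not Gitian's code: `fetch_source` is a hypothetical helper, the cache directory is whatever path the caller passes in, and the download step is injected as a callable so the logic can be exercised without a network.

```python
import hashlib
from pathlib import Path


def fetch_source(filename, expected_sha256, common_cache, download):
    """Return the path to a verified source archive, downloading at most once.

    `download` is a caller-supplied callable returning the file's bytes;
    it is only invoked when the common cache holds no valid copy.
    """
    cache_dir = Path(common_cache)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if target.exists():
        if hashlib.sha256(target.read_bytes()).hexdigest() == expected_sha256:
            return target      # cache hit: an earlier descriptor already fetched it
        target.unlink()        # corrupt or stale copy: discard and refetch

    data = download()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {filename}")
    target.write_bytes(data)
    return target
```

If two descriptors call this with the same filename and cache directory, the second call returns the cached file without invoking `download`. Verifying the checksum on a cache hit as well as on download is a design choice made for this sketch; it means a pre-seeded or damaged file is never trusted blindly.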
|
Okay, putting a concise version of this in the doc directory would be helpful. I think we can recommend that sources go in the common cache, and binary build artifacts go in the build-specific cache. I just realized that the Gitian build process can be run without a network connection if the cached sources are present. So I can go ahead and accept the pull request if you add the docs, unless you have further thoughts. |
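As a rough illustration of the offline case, the check below reports which source archives are absent from the common cache; if the list is empty, a build would not need the network for sources. The helper name `missing_sources` and the idea of passing an explicit list of required filenames are assumptions for this sketch, not part of Gitian.

```python
from pathlib import Path


def missing_sources(required, common_cache):
    """Return the required source filenames not present in the common cache."""
    cache_dir = Path(common_cache)
    return sorted(
        name for name in required
        if not (cache_dir / name).is_file()
    )
```

A caller could run this before starting an offline build and abort early with the list of files to pre-seed.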
|
Yea, the cache can be pre-seeded to mimic the use of inputs. I was tempted to use the inputs dir itself for the global cache, but I think that might lead to some nasty accidents. I'll do up some docs. Thanks for hearing me out! |
|
@devrandom added a quick readme. |
Add common cache and per-build cache
This is merely a POC to start a discussion. I'm sure there's a nicer way of achieving the same thing.
Allow each builder to cache some files for re-use in the next build. This allows for poor-man's dependency chaining.
Additionally, add a common cache pool for all builds. This can be used for saving (for example) downloaded files to be shared between builds.
Needed for the Bitcoin build process overhaul. I'll link the PR for discussion once it's posted.